Note to professor

I have used R (not Stata) for tihs assignment for two reasons: (i) I don’t have a Stata license (I am not part of the Master’s programme) and (ii) I prefer to use open-source software whenever possible (David 2012). Please let me know if this is a problem.

Assignment description

Use the Stata dataset called “panamadata.dta” that is provided in the Week 3 subdirectory in Canvas. This contains some of the variables from the Panama Living Standards Measurement Survey, 2008. Questions are as follows.

Homework

Section 0

Data preparation and preliminary exploration

The below section shows the steps taken to read, process, and prepare the data for analysis.

# Attach libraries
library(readstata13) # for reading stata files
library(dplyr) # for dataframe manipulation
library(ggplot2) # for generating charts
library(databrew) # for making chart theme consistent; devtools::install_github('databrew', 'databrew')
library(DT) # for dynamic html tables
library(raster) # for pixel-based GIS
library(sp) # for polygon-based GI
library(rgeos); library(maptools) # more spatial tools

# Read the dta file
df <- read.dta13('panamadata.dta')

# Get a shapefile of Panama
pan <- getData(name = 'GADM', country = 'PAN', level = 1)
# Clean up the shapefile so that it's joinable with df
df$prov <- gsub('comarca ', '', df$prov)
pan@data$prov <- tolower(pan@data$NAME_1)
pan@data$prov[pan@data$prov == 'ngöbe buglé'] <- 'ngöbe bugle'
# Assume that panama oeste is the same as panama
pan@data$prov[pan@data$prov == 'panamá oeste'] <- 'panamá'
# Create an area
pan@data <-
  left_join(x = pan@data,
            y = df %>% group_by(prov) %>% summarise(area = dplyr::first(area)),
            by = 'prov')

# Take a look at our plot
plot(pan)

# Make a ggplot compatible version of our plot too
pan_fortified <- broom::tidy(pan, region = 'prov')
pan_fortified <-
  left_join(x = pan_fortified %>% mutate(prov = id),
            y = df %>% group_by(prov) %>% summarise(area = dplyr::first(area)),
            by = 'prov')

# Define function for mapping
mapper <- function(map = pan,
                   data = x,
                   tile = 'Stamen.Toner',
                  palette = 'YlOrRd',
                  show_legend = TRUE){

  map@data <- map@data %>% left_join(data, by = names(x)[1])
  require(dplyr)
    require(leaflet)
    require(RColorBrewer)
  
  bins <- sort(unique(round(c(c(0, quantile(map@data$percent, seq(0, 1, by = 0.1), na.rm = TRUE), max(map@data$percent, na.rm = TRUE) * 1.1)))))
  pal <- colorBin(palette, domain = map@data$percent, bins = bins)
  map <- map[!is.na(map@data$percent),]
  
  popper <- paste0(map@data$prov, ': ', map@data$percent)
  l <- leaflet(data = map) %>%
      addProviderTiles(tile)
  if(show_legend){
    l <- l %>%
      addLegend(pal = pal, values = ~percent, opacity = 0.7, title = NULL,
                position = "bottomleft")
    }
    l <- l %>%
      addPolygons(fillColor = ~pal(percent),
      fillOpacity = 0.8,
      color = "#BDBDC3",
      weight = 1,
      # popup = popper,
      highlight = highlightOptions(
      weight = 5,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE),
      label = popper,
      labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"))
  return(l)
}

# Calculate poverty gaps
df$poverty_gap <- df$cons_pc / df$povline
df$extreme_poverty_gap <- df$cons_pc / df$xpovline

# Define whether poor or not
df$poor <- df$poverty_gap < 1

# Arrange dataset by poverty
df <- df %>% arrange(poverty_gap) %>% 
  # Calculate order
  mutate(id = 1:nrow(df)) %>%
  # Calculate percentile
  mutate(p = id / max(id) * 100) %>%
  # Calculate cumulative amounts and people
  mutate(cum_people = cumsum(hhsize),
         cum_consumption = cumsum(consumption))
  

# Plot the poverty incidence curve
ggplot(data = df, aes(x = cons_pc, 
                      y = cum_consumption / 
                        max(cum_consumption) * 100)) +
  geom_area(alpha = 0.6,
            fill = 'darkorange') +
  geom_line(lty = 2) +
  theme_databrew() +
  labs(x = 'Consumption per capita',
       y = '% of population',
       title = 'Poverty incidence curve',
       subtitle = 'With first-order stochastic dominance') +
  ylim(0, 100)

Section 1

Using the data file on Panama, compute the following: headcount, poverty gap, squared poverty gap…

  1. at the national level
  2. at the area level (rural, urban, indigenous)
  3. at the provincial level

Produce calculations using the ordinary as well as the extreme poverty lines.

Headcount

At the national level

# a) at the national level
sum(df$hhsize[df$poor]) # actual count
[1] 11141
sum(df$hhsize[df$poor]) / sum(df$hhsize) * 100 # percent
[1] 41.01686

At the area level

# b) at the area level (rural, urban, indigenous)
x <- df %>% 
  group_by(area) %>% 
  summarise(headcount_poor = sum(hhsize[poor]),
            population = sum(hhsize)) %>%
  mutate(percent = headcount_poor / population * 100)
# Table
x
area headcount_poor population percent
urbana 2463 13499 18.24580
rural 5766 10625 54.26824
indígena 2912 3038 95.85253
# Chart
ggplot(data = x,
       aes(x = area,
           y = percent)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(percent, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Area',
       y = 'Poverty headcount',
       title = 'Poverty headcount by area')

# Map
mapper(tile = 'Stamen.Toner', data = x)

At the provincial level

# c) at the provincial level 
x <- df %>% 
  group_by(prov) %>% 
  summarise(headcount_poor = sum(hhsize[poor]),
            population = sum(hhsize)) %>%
  mutate(percent = headcount_poor / population * 100)
# Table
x
prov headcount_poor population percent
bocas del toro 1617 2762 58.54453
chiriquí 650 2001 32.48376
coclé 885 1684 52.55344
colón 405 1584 25.56818
darién 961 1678 57.27056
emberá 124 142 87.32394
herrera 522 1533 34.05088
kuna yala 467 512 91.21094
los santos 395 1262 31.29952
ngöbe bugle 1752 1789 97.93181
panamá 2460 10556 23.30428
veraguas 903 1659 54.43038
# Chart
ggplot(data = x,
       aes(x = prov,
           y = percent)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(percent, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Province',
       y = 'Poverty headcount',
       title = 'Poverty headcount by province') +
  theme(axis.text.x = element_text(angle = 90))

# Map
mapper(tile = 'Stamen.Terrain', data = x)

Poverty gap

At the national level

# Formula for poverty gap index is (1 / total population) * sum((poverty line - consumption)/ poverty line)

# a) at the national level
weighted.mean(df$poverty_gap, w = df$hhsize)
[1] 1.843468

At the area level

# b) at the area level (rural, urban, indigenous)
x <- df %>% 
  group_by(area) %>% 
  summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)) 
# Table
x
area poverty_gap
urbana 2.6229940
rural 1.2771785
indígena 0.3602598
# Chart
ggplot(data = x,
       aes(x = area,
           y = poverty_gap)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Area',
       y = 'Poverty gap',
       title = 'Poverty gap index by area')

# Map
mapper(tile = 'Esri.DeLorme', data = x %>% mutate(percent = poverty_gap), palette = 'Purples')

At the provincial level

# c) at the provincial level 
x <- df %>% 
  group_by(prov) %>% 
  summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)) 
# Table
x
prov poverty_gap
bocas del toro 1.3887111
chiriquí 1.8711549
coclé 1.3392145
colón 2.0718112
darién 1.1300094
emberá 0.6430642
herrera 1.8512499
kuna yala 0.5552103
los santos 2.0322630
ngöbe bugle 0.2640200
panamá 2.5292465
veraguas 1.2718633
# Chart
ggplot(data = x,
       aes(x = prov,
           y = poverty_gap)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Province',
       y = 'Poverty gap',
       title = 'Poverty gap index by province') +
  theme(axis.text.x = element_text(angle = 90))

# Map
mapper(tile = 'Stamen.Watercolor', data = x %>% mutate(percent = poverty_gap), palette = 'Greens')

Squared poverty gap

# a) at the national level
weighted.mean(df$poverty_gap, w = df$hhsize) ^2
[1] 3.398376

At the area level

# b) at the area level (rural, urban, indigenous)
x <- df %>% 
  group_by(area) %>% 
  summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)^2) 
# Table
x
area poverty_gap
urbana 6.8800978
rural 1.6311848
indígena 0.1297871
# Chart
ggplot(data = x,
       aes(x = area,
           y = poverty_gap)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Area',
       y = 'Squared poverty gap',
       title = 'Squared poverty gap index by area')

# Map
mapper(tile = 'Esri.WorldImagery', data = x %>% mutate(percent = poverty_gap), palette = 'Greens')

At the provincial level

# c) at the provincial level 
x <- df %>% 
  group_by(prov) %>% 
  summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)^2) 
# Table
x
prov poverty_gap
bocas del toro 1.9285186
chiriquí 3.5012205
coclé 1.7934954
colón 4.2924016
darién 1.2769212
emberá 0.4135316
herrera 3.4271263
kuna yala 0.3082585
los santos 4.1300930
ngöbe bugle 0.0697065
panamá 6.3970880
veraguas 1.6176362
# Chart
ggplot(data = x,
       aes(x = prov,
           y = poverty_gap)) +
  geom_bar(stat = 'identity',
           fill = 'darkorange',
           alpha = 0.6) + 
  geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
  theme_databrew() +
  labs(x = 'Province',
       y = 'Squared poverty gap',
       title = 'Squared poverty gap index by province') +
  theme(axis.text.x = element_text(angle = 90))

# Map
mapper(tile = 'Esri.WorldPhysical', data = x %>% mutate(percent = poverty_gap), palette = 'Oranges')

Additional question

Report the precision of your poverty estimates. (How can you take into account that the sample is ‘clustered’: clusters have been sampled randomly, and within clusters households have been sampled randomly?)

Given the clustering, we should use robust standard errors.

Section 2

Using the data file on Panama calculate the between and with-in group contribution to per capita consumption inequality from the following population groups (based on the General Entropy inequality measures with parameter values 0, 1, and 2).

Note to professor: I use the IC2 package for these decomposition calculations (Plat 2012).

Province

library(IC2)
df$prov <- factor(df$prov)

for (i in 0:2){
  x <- decompAtkinson(x = df$cons_pc, 
                    z = df$prov,
                    w = df$hhsize, 
                    epsilon = i, 
                    decomp = "BDA", 
                    ELMO = TRUE)
  cat(paste0('#### Epsilon: ', i, '   -----------------------------------------------------------\n'))
  summary(x)
}
#### Epsilon: 0   -----------------------------------------------------------
       Atk           epsilon        
1.1102e-16        0.0000e+00        

Decomposition (BDA):
     within      between        cross  
-2.2204e-16   0.0000e+00   0.0000e+00  
betweenELMO  
 1.1102e-16  
#### Epsilon: 1   -----------------------------------------------------------
    Atk        epsilon        
0.36133        1.00000        

Decomposition (BDA):
  within   between     cross  
0.278419  0.114904  0.031991  
betweenELMO  
    0.31409  
#### Epsilon: 2   -----------------------------------------------------------
    Atk        epsilon        
0.62464        2.00000        

Decomposition (BDA):
 within  between    cross  
0.45453  0.31185  0.14175  
betweenELMO  
    0.58025  

Area (rural, urban, indigenous)

df$area <- factor(df$area)
for (i in 0:2){
  x <- decompAtkinson(x = df$cons_pc, 
                    z = df$area,
                    w = df$hhsize, 
                    epsilon = i, 
                    decomp = "BDA", 
                    ELMO = TRUE)
  cat(paste0('#### Epsilon: ', i, '  -----------------------------------------------------------\n'))
  summary(x)
}
#### Epsilon: 0  -----------------------------------------------------------
       Atk           epsilon        
1.1102e-16        0.0000e+00        

Decomposition (BDA):
     within      between        cross  
-2.2204e-16   0.0000e+00   0.0000e+00  
betweenELMO  
 1.1102e-16  
#### Epsilon: 1  -----------------------------------------------------------
    Atk        epsilon        
0.36133        1.00000        

Decomposition (BDA):
  within   between     cross  
0.257313  0.140057  0.036038  
betweenELMO  
    0.27511  
#### Epsilon: 2  -----------------------------------------------------------
    Atk        epsilon        
0.62464        2.00000        

Decomposition (BDA):
 within  between    cross  
0.44212  0.32715  0.14464  
betweenELMO  
    0.53336  

Family size category

df$family_size <- cut(df$hhsize, c(0, 3, 6, 10, 30))
df$family_size <- factor(df$family_size)
for (i in 0:2){
  x <- decompAtkinson(x = df$cons_pc, 
                    z = df$family_size,
                    w = df$hhsize, 
                    epsilon = i, 
                    decomp = "BDA", 
                    ELMO = TRUE)
  cat(paste0('#### Epsilon: ', i, '  -----------------------------------------------------------\n'))
  summary(x)
}
#### Epsilon: 0  -----------------------------------------------------------
       Atk           epsilon        
1.1102e-16        0.0000e+00        

Decomposition (BDA):
    within     between       cross  
0.0000e+00  1.1102e-16  0.0000e+00  
betweenELMO  
          0  
#### Epsilon: 1  -----------------------------------------------------------
    Atk        epsilon        
0.36133        1.00000        

Decomposition (BDA):
  within   between     cross  
0.271615  0.123171  0.033455  
betweenELMO  
    0.31104  
#### Epsilon: 2  -----------------------------------------------------------
    Atk        epsilon        
0.62464        2.00000        

Decomposition (BDA):
 within  between    cross  
0.49047  0.26331  0.12915  
betweenELMO  
    0.56481  

The assignment is to be turned in by 17:00 on Wednesday, November 22nd. Make sure to include your name. The assignment is to be sent by email to: p.f.lanjouw@vu.nl

Data dictionary

The below is the data dictionary, as provided in the assignment description.

variable_name variable_label
Psu primary sampling unit
Household household id
Questionnaire questionnaire number
Prov Province
District District
Region Region
Area area (urban/rural/indigenous)
Consumption total household consumption in prices june 08
cons_pc consumption per capita in prices june 08
Hhsize household size
Xpovline extreme poverty line
Povline poverty line
Indweight individual weight
Hhweight household weight

Bibliography

David, Steele Robert. 2012. The Open-Source Everything Manifesto. North Atlantic Books. http://realitysandwich.com/151036/open_source_everything_manifesto/.

Plat, Didier. 2012. IC2: Inequality and Concentration Indices and Curves. https://CRAN.R-project.org/package=IC2.